Scalable Parallel Scientific Computing Using Twister4Azure

Authors

  • Thilina Gunarathne
  • Bingjing Zhang
  • Tak-Lon Wu
  • Judy Qiu
Abstract

Recent advances in data-intensive computing for science discovery are fueling a dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment for scientists to perform data analytics. The challenges of large-scale distributed computations demand new frameworks that are specifically tailored to cloud characteristics in order to easily and effectively harness the power of clouds. Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure Cloud. It extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling a wide array of data mining and data analysis applications on the Azure cloud. This paper discusses the applicability of Twister4Azure to scientific computation, highlighting its fault tolerance, efficiency and simplicity. We study four data-intensive applications: two iterative scientific applications, Multi-Dimensional Scaling and KMeans Clustering, and two data-intensive pleasingly parallel scientific applications, BLAST+ sequence searching and Smith-Waterman sequence alignment. Performance measurements show results comparable to, or a factor of 2 to 4 better than, traditional MapReduce runtimes, on deployments of up to 256 instances and for jobs with tens of thousands of tasks. We also study and present solutions to several factors that affect the performance of iterative MapReduce applications on the Windows Azure Cloud.

Keywords: Iterative MapReduce, Cloud Computing, HPC, Scientific applications, Azure

Similar resources

Scalable parallel computing on clouds using Twister4Azure iterative MapReduce

Recent advances in data-intensive computing for science discovery are fueling a dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment in which scientists can perform data analytics. The challenges to large-scale di...

Iterative MapReduce for Azure Cloud

The MapReduce distributed data processing architecture has become the de facto data-intensive analysis mechanism in compute clouds and in commodity clusters, mainly due to its excellent fault tolerance, scalability, ease of use and simple programming model. MapReduceRoles for Azure (MR4Azure) is a decentralized, dynamically scalable MapReduce runtime we developed for Windows Azure Clo...

Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.

Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains. In this paper, we introduce a novel platform for parallel computing using MPI and OpenMP, based on a set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...

Scalable Heuristic Algorithms for the Parallel Execution of Data Flow Acyclic Digraphs

Data flow acyclic directed graphs (digraphs) can be applied to accurately describe the data dependency for a wide range of grid-based scientific computing applications ranging from numerical algebra to realistic applications of radiation or neutron transport. The parallel computing of these applications is equivalent to the parallel execution of digraphs. This paper presents a framework of scal...

Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, grids and clouds. Reducing energy consumption for high end computing can bring various benefits such as red...

Publication date: 2012